-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add wanda anyscale image builds for release tests #59937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: andrew/revup/master/ray-image
Are you sure you want to change the base?
Add wanda anyscale image builds for release tests #59937
Conversation
|
f104bf5 to
288cadb
Compare
f963b42 to
0710971
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the Anyscale image build process to use Wanda for building and a new Python script for pushing images to multiple registries. The changes include new Wanda configuration files, the push_anyscale_image.py script, and updates to the Buildkite pipeline. My review identifies a few critical issues that will break the build, including a missing file in a Bazel target and an incomplete matrix configuration in the pipeline. I've also provided several suggestions to improve maintainability by reducing code duplication and making parts of the new script less brittle.
| py_binary( | ||
| name = "push_ray_image", | ||
| srcs = ["push_ray_image.py"], | ||
| exec_compatible_with = ["//bazel:py3"], | ||
| deps = [ | ||
| ":crane_lib", | ||
| "//ci/ray_ci:ray_ci_lib", | ||
| ci_require("click"), | ||
| ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - bash release/gcloud_docker_login.sh release/aws2gce_iam.json | ||
| - bash release/azure_docker_login.sh | ||
| - az acr login --name rayreleasetest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These authentication commands are duplicated across the anyscalebuild, anyscalellmbuild, and anyscalemlbuild push jobs. To improve maintainability and reduce redundancy, consider using YAML anchors and aliases to define these commands once and reuse them in each job. This will make future updates to the authentication process much easier.
|
|
||
| # GPU_PLATFORM is the default GPU platform that gets aliased as "gpu" | ||
| # This must match the definition in ci/ray_ci/docker_container.py | ||
| GPU_PLATFORM = "cu12.1.1-cudnn8" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GPU_PLATFORM is hardcoded here. The comment mentions it must match a definition in another file, which makes this brittle. If the default GPU platform changes, this script will require a manual update, which can be easily missed. Consider sourcing this value from a shared configuration file or passing it as a command-line argument to make the script more robust and maintainable.
| def _format_platform_tag(platform: str) -> str: | ||
| """Format platform as -cpu or shortened CUDA version.""" | ||
| if platform == "cpu": | ||
| return "-cpu" | ||
| # cu12.3.2-cudnn9 -> -cu123 | ||
| versions = platform.split(".") | ||
| return f"-{versions[0]}{versions[1]}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for parsing the CUDA version string using platform.split('.') is fragile. It assumes a specific format with dots as separators and might fail with an IndexError if the platform string format changes (e.g., cu12-cudnn8). Using a regular expression would be more robust for extracting the version parts. For example: re.match(r"cu(\\d+)\\.(\\d+)", platform).
| def _get_image_tags(python_version: str, platform: str) -> List[str]: | ||
| """ | ||
| Generate image tags matching the original docker_container.py format. | ||
| Returns multiple tags for the image (canonical + aliases). | ||
| For GPU_PLATFORM, also generates -gpu alias tags to match release test expectations. | ||
| """ | ||
| branch = os.environ.get("BUILDKITE_BRANCH", "") | ||
| commit = os.environ.get("BUILDKITE_COMMIT", "")[:6] | ||
| rayci_build_id = os.environ.get("RAYCI_BUILD_ID", "") | ||
|
|
||
| py_tag = _format_python_version_tag(python_version) | ||
| platform_tag = _format_platform_tag(platform) | ||
|
|
||
| # For GPU_PLATFORM, also create -gpu alias (release tests use type: gpu) | ||
| platform_tags = [platform_tag] | ||
| if platform == GPU_PLATFORM: | ||
| platform_tags.append("-gpu") | ||
|
|
||
| tags = [] | ||
|
|
||
| if branch == "master": | ||
| # On master, use sha and build_id as tags | ||
| for ptag in platform_tags: | ||
| tags.append(f"{commit}{py_tag}{ptag}") | ||
| if rayci_build_id: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{rayci_build_id}{py_tag}{ptag}") | ||
| elif branch.startswith("releases/"): | ||
| # On release branches, use release name | ||
| release_name = branch[len("releases/") :] | ||
| for ptag in platform_tags: | ||
| tags.append(f"{release_name}.{commit}{py_tag}{ptag}") | ||
| if rayci_build_id: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{rayci_build_id}{py_tag}{ptag}") | ||
| else: | ||
| # For other branches (PRs, etc.) | ||
| pr = os.environ.get("BUILDKITE_PULL_REQUEST", "false") | ||
| if pr != "false": | ||
| for ptag in platform_tags: | ||
| tags.append(f"pr-{pr}.{commit}{py_tag}{ptag}") | ||
| else: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{commit}{py_tag}{ptag}") | ||
| if rayci_build_id: | ||
| for ptag in platform_tags: | ||
| tags.append(f"{rayci_build_id}{py_tag}{ptag}") | ||
|
|
||
| return tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function has a lot of repeated code, especially the block for appending rayci_build_id tags, which is identical in all if/elif/else branches. This can be refactored to be more DRY (Don't Repeat Yourself), which improves readability and maintainability. You could determine the tag prefix first, then generate the tags in a single block of code.
def _get_image_tags(python_version: str, platform: str) -> List[str]:
"""
Generate image tags matching the original docker_container.py format.
Returns multiple tags for the image (canonical + aliases).
For GPU_PLATFORM, also generates -gpu alias tags to match release test expectations.
"""
branch = os.environ.get("BUILDKITE_BRANCH", "")
commit = os.environ.get("BUILDKITE_COMMIT", "")[:6]
rayci_build_id = os.environ.get("RAYCI_BUILD_ID", "")
py_tag = _format_python_version_tag(python_version)
platform_tag = _format_platform_tag(platform)
# For GPU_PLATFORM, also create -gpu alias (release tests use type: gpu)
platform_tags = [platform_tag]
if platform == GPU_PLATFORM:
platform_tags.append("-gpu")
tags = []
if branch == "master":
prefix = commit
elif branch.startswith("releases/"):
release_name = branch[len("releases/") :]
prefix = f"{release_name}.{commit}"
else:
pr = os.environ.get("BUILDKITE_PULL_REQUEST", "false")
if pr != "false":
prefix = f"pr-{pr}.{commit}"
else:
prefix = commit
for ptag in platform_tags:
tags.append(f"{prefix}{py_tag}{ptag}")
if rayci_build_id:
for ptag in platform_tags:
tags.append(f"{rayci_build_id}{py_tag}{ptag}")
return tags0710971 to
4cdb2dd
Compare
46cc862 to
70df756
Compare
4cdb2dd to
f65669a
Compare
70df756 to
ae32d45
Compare
f65669a to
1141897
Compare
ae32d45 to
04cfd26
Compare
af3b600 to
6bbeb9e
Compare
04cfd26 to
55d1473
Compare
6bbeb9e to
1f049e8
Compare
55d1473 to
84d7479
Compare
84d7479 to
c275fe2
Compare
d43ccd7 to
8980d9c
Compare
7c78fa2 to
29cf5ab
Compare
8980d9c to
4a143b7
Compare
29cf5ab to
d566c8a
Compare
4a143b7 to
9243672
Compare
d566c8a to
9190029
Compare
79224a6 to
3123fe2
Compare
dd5133b to
ae00964
Compare
a57ba93 to
abd456d
Compare
ae00964 to
021b2e1
Compare
abd456d to
2dc9e7b
Compare
021b2e1 to
52b7b21
Compare
2dc9e7b to
5e859ee
Compare
52b7b21 to
8cfa977
Compare
5e859ee to
dc8bd48
Compare
8cfa977 to
924d46f
Compare
dc8bd48 to
fc711fb
Compare
924d46f to
275827a
Compare
fc711fb to
acdfe07
Compare
04d6806 to
6e985e9
Compare
acdfe07 to
50ffff8
Compare
6e985e9 to
17d793f
Compare
50ffff8 to
b07e139
Compare
17d793f to
716cc65
Compare
b07e139 to
77436f8
Compare
716cc65 to
976be9f
Compare
77436f8 to
71b04f1
Compare
This adds wanda-based builds for anyscale images used in release tests. Changes: - Add ray-anyscale-cpu/cuda.wanda.yaml for ray anyscale images - Add ray-llm-anyscale-cuda.wanda.yaml for LLM images - Add ray-ml-anyscale-cuda.wanda.yaml for ML images - Add push_anyscale_image.py for pushing to ECR/GCP/Azure via crane - Update release/build.rayci.yml with build and push steps - Update gcloud_docker_login.sh to v550.0.0 with arch detection Topic: anyscale-image Relative: ray-image Labels: draft Signed-off-by: andrew <[email protected]>
976be9f to
f4e1306
Compare
71b04f1 to
ebae2c5
Compare
f4e1306 to
73da838
Compare
ebae2c5 to
96eb30f
Compare
This adds wanda-based builds for anyscale images used in release tests.
Changes:
Topic: anyscale-image
Relative: ray-image
Labels: draft
Signed-off-by: andrew [email protected]